{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div align=\"right\" style=\"text-align: right\"><i>Peter Norvig, Oct 2017<br>pandas Aug 2020<br>Data updated monthly</i></div>\n",
    "\n",
    "# Bike Code\n",
    "\n",
    "Code to support the analysis in the notebook [Bike Speed versus Grade.ipynb](Bike%20Speed%20versus%20Grade.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "from typing import Iterator, Tuple, List, Dict\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy  as np\n",
    "import pandas as pd\n",
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading Data: `rides`\n",
    "\n",
    "I downloaded a bunch of my recorded [Strava](https://www.strava.com/athletes/575579) rides, most of them longer than 25 miles (with a few exceptions), as [`bikerides.tsv`](bikerides.tsv).  The columns are: the date; the year; a title; the elapsed time of the ride; the length of the ride in miles; and the total climbing in feet, e.g.: \n",
    "\n",
    "    Mon, 10/5\t2020\tHalf way around the bay on bay trail\t6:26:35\t80.05\t541\n",
    "    \n",
    "I parse the file into the pandas dataframe `rides`, adding derived columns for miles per hour, vertical feet climbed per hour (VAM), grade in feet per mile, grade in percent, and kilometers ridden:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_hours(time: str) -> float: \n",
    "    \"\"\"Parse '4:30:00' => 4.5 hours.\"\"\"\n",
    "    while time.count(':') < 2: \n",
    "        time = '0:' + time\n",
    "    return round(pd.Timedelta(time).seconds / 60 / 60, 4)\n",
    "\n",
    "def parse_int(field: str) -> int: return int(field.replace(',', ''))\n",
    "\n",
    "def add_derived_columns(rides) -> pd.DataFrame:\n",
    "    return rides.assign(\n",
    "        mph=round(rides['miles'] / rides['hours'], 2),\n",
    "        vam=round(rides['feet'] / rides['hours']),\n",
    "        fpm=round(rides['feet']  / rides['miles']),\n",
    "        pct=round(rides['feet']  / rides['miles'] * 100 / 5280, 2),\n",
    "        kms=round(rides['miles'] * 1.609, 2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "rides = add_derived_columns(pd.read_table(open('bikerides.tsv'), comment='#',\n",
    "            converters=dict(hours=parse_hours, feet=parse_int)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading Data: `segments`\n",
    "\n",
    "I picked some representative climbing segments ([`bikesegments.csv`](bikesegments.csv)) with the segment length in miles and climb in feet, along with several of my times on the segment. A line like\n",
    "\n",
    "    Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44\n",
    "    \n",
    "means that this segment of Old La Honda Rd is 2.98 miles long, 1255 feet of climbing, and I've selected three times for my rides on that segment: the fastest, middle, and slowest of the times  that Strava shows. (However, I ended up dropping the slowest time in the charts to make them less busy.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_segments(lines):\n",
    "    \"\"\"Parse segments into rides. Each ride is a tuple of:\n",
    "    (segment_title, time,  miles, feet_climb).\"\"\"\n",
    "    for segment in lines:\n",
    "        title, mi, ft, *times = segment.split(',')[:5]\n",
    "        for time in times:\n",
    "            yield title, parse_hours(time), float(mi), parse_int(ft)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "segments = add_derived_columns(pd.DataFrame(\n",
    "               parse_segments(open('bikesegments.csv')), \n",
    "               columns='title\thours\tmiles\tfeet'.split()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading Data: `places`\n",
    "\n",
    "Monthly, I will take my [summary data from wandrer.earth](https://wandrer.earth/athletes/3534/santa-clara-county-california) and enter it in the file [bikeplaces.txt](bikeplaces.txt), in a format where\n",
    "\n",
    "      Cupertino: 172: 22.1 23.9 26.2*3 26.3 | 26.4\n",
    "      \n",
    "means that Cupertino has 172 miles of roads, and that by the first month I started keeping track, I had ridden 22.1% of them; in the last month 26.4%; and the `26.2*3` means that for 3 months in a row I had 26.2%. The `|` indicates the end of a year. A line that starts with `#` is a comment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Month(int):\n",
    "    \"\"\"An integer in the form: 12 * year + month.\"\"\"\n",
    "    def __str__(self): return f'{(self - 1) // 12}-{(self % 12) or 12:02d}'\n",
    "\n",
    "start   = Month(2020 * 12 + 7) # Starting month: July 2020\n",
    "bonuses = (25, 90, 99)         # Percents the earn important bonuses\n",
    "\n",
    "Entry = Tuple[str, float, List[float]] # (Place_Name, miles_of_roads, [pct_by_month,...])\n",
    "\n",
    "def wandrer(category, entries, start=start):\n",
    "    \"\"\"Plot Wandrer.earth data.\"\"\"\n",
    "    fig, ax = plt.figure(), plt.subplot(111); \n",
    "    plt.plot()\n",
    "    for (place, miles, pcts), marker in zip(entries, '^v><osdhxDHPX*1234'):\n",
    "        N = len(pcts)\n",
    "        dates = [Month(start + i) for i in range(N)]\n",
    "        X = [dates[i] for i in range(N) if pcts[i]]\n",
    "        Y = [pcts[i]  for i in range(N) if pcts[i]]\n",
    "        ax.plot(X, Y, ':', marker=marker, label=label(pcts, place, miles))\n",
    "    all_pcts = [p for _, _, pcts in entries for p in pcts if p]\n",
    "    for p in bonuses: \n",
    "        if min(all_pcts) < p < max(all_pcts):\n",
    "            ax.plot(dates, [p] * N, 'k:', lw=1, alpha=3/4) # Plot bonus line\n",
    "    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True,\n",
    "              prop=matplotlib.font_manager.FontProperties(family='monospace'))\n",
    "    plt.xticks(dates, [str(d) for d in dates], rotation=90)\n",
    "    plt.ylabel('Percent of Area Ridden')\n",
    "    plt.title(category); plt.tight_layout(); grid(axis='y'); plt.show()\n",
    "    \n",
    "def label(pcts, place, miles) -> str:\n",
    "    pct = f'{rounded(pcts[-1]):>3}' if pcts[-1] > 1.4 else f'{pcts[-1]}'\n",
    "    done = miles * pcts[-1]\n",
    "    bonus = next((f' {rounded((p - pcts[-1]) / 100 * miles):>3} to {p}%' \n",
    "                  for p in bonuses if p >= pcts[-1]), '')\n",
    "    return f'{pct}% ({rounded(done / 100):>3}/{rounded(miles):<3} mi){bonus} {place}'\n",
    "    \n",
    "def parse_places(lines) -> Dict[str, List[Entry]]:\n",
    "    \"Parse bikeplaces.txt into a dict of {'Title': [entry,...]}\"\n",
    "    places = {}\n",
    "    category = None\n",
    "    for line in lines:\n",
    "        line = line.strip()\n",
    "        if line.startswith('#') or not line: \n",
    "            pass\n",
    "        elif line.startswith(':'):\n",
    "            title = line.strip(':')\n",
    "            places[title] = []\n",
    "        else:\n",
    "            places[title].append(parse_entry(line))\n",
    "            places[title].sort(key=lambda entry: -entry[-1][-1])\n",
    "    return places\n",
    "    \n",
    "def parse_entry(line: str) -> Entry:\n",
    "    \"\"\"Parse line => ('Place Name', miles, [percents]); '=' can be used.\"\"\"\n",
    "    if line.count(':') != 2:\n",
    "        print('bad', line)\n",
    "    place, miles, pcts = line.replace('|', ' ').split(':')\n",
    "    pcts = re.sub('( [0-9.]+)[*]([0-9]+)', lambda m: m.group(1) * int(m.group(2)),\n",
    "                  pcts).split()\n",
    "    for i, p in enumerate(pcts):\n",
    "        pcts[i] = pcts[i - 1] if p == '=' else 100 if p == '100' else float(p)\n",
    "    return place, float(miles), pcts \n",
    "                   \n",
    "def rounded(x: float) -> str: return f'{round(x):,d}' if x > 10 else f'{x:.1f}'\n",
    "\n",
    "def wandering(places: dict):\n",
    "    \"Plot charts of unique roads ridden in various places.\"\n",
    "    for category in places:\n",
    "        wandrer(category, places[category])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "places = parse_places(open('bikeplaces.txt'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def update_places(filename='bikeplaces.txt'):\n",
    "    return ''.join(map(update_line, open(filename)))\n",
    "\n",
    "def update_line(line):\n",
    "    words = line.split()\n",
    "    if not words or words[0].startswith(':'):\n",
    "        pass\n",
    "    elif '*' in words[-1]:\n",
    "        m, d = words[-1].split('*')\n",
    "        words[-1] = m + '*' + str(int(d) + 1)\n",
    "    else:\n",
    "        words[-1] += '*2'\n",
    "    return ' '.join(words) + '\\n'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Eddington Number"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Ed_number(distances) -> int:\n",
    "    \"\"\"Eddington number: The maximum integer e such that you have bicycled \n",
    "    a distance of at least e on at least e days.\"\"\"\n",
    "    distances = sorted(distances, reverse=True)\n",
    "    return max(e for e, d in enumerate(distances, 1) if d >= e)\n",
    "\n",
    "def Ed_gap(distances, target) -> int:\n",
    "    \"\"\"The number of rides needed to reach an Eddington number target.\"\"\"\n",
    "    return target - sum(distances > target)\n",
    "\n",
    "def Ed_progress(years=range(2013, 2022), rides=rides) -> pd.DataFrame:\n",
    "    \"\"\"A table of Eddington numbers by year, and a plot.\"\"\"\n",
    "    def Ed(year, d): return Ed_number(rides[rides['year'] <= year][d])\n",
    "    data  = [(y, Ed(y, 'kms'), Ed(y, 'miles')) for y in years]\n",
    "    frame = pd.DataFrame(data, columns=['year', 'Ed_km', 'Ed_mi'])\n",
    "    frame.plot('year', ['Ed_km', 'Ed_mi'], style='o:',\n",
    "               title='Eddington Numbers in kms and miles')\n",
    "    grid(axis='y')\n",
    "    return frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plotting and Curve-Fitting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.rcParams[\"figure.figsize\"] = (10, 6)\n",
    "\n",
    "def show(X, Y, data, title='', degrees=(2, 3)): \n",
    "    \"\"\"Plot X versus Y and a best fit curve to it, with some bells and whistles.\"\"\"\n",
    "    grid(); plt.ylabel(Y); plt.xlabel(X); plt.title(title)\n",
    "    plt.scatter(X, Y, data=data, c='grey', marker='+')\n",
    "    X1 = np.linspace(min(data[X]), max(data[X]), 100)\n",
    "    for degree in degrees:\n",
    "        F = poly_fit(data[X], data[Y], degree)\n",
    "        plt.plot(X1, [F(x) for x in X1], '-')\n",
    "    \n",
    "def grid(axis='both'): \n",
    "    \"Turn on the grid.\"\n",
    "    plt.minorticks_on() \n",
    "    plt.grid(which='major', ls='-', alpha=3/4, axis=axis)\n",
    "    plt.grid(which='minor', ls=':', alpha=1/2, axis=axis)\n",
    "    \n",
    "def poly_fit(X, Y, degree: int) -> callable:\n",
    "    \"\"\"The polynomial function that best fits the X,Y vectors.\"\"\"\n",
    "    coeffs = np.polyfit(X, Y, degree)[::-1]\n",
    "    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs)) \n",
    "\n",
    "estimator = poly_fit(rides['feet'] / rides['miles'], \n",
    "                   rides['miles'] / rides['hours'], 2)\n",
    "\n",
    "def estimate(miles, feet, estimator=estimator) -> float:\n",
    "    \"\"\"Given a ride distance in miles and total climb in feet, estimate time in minutes.\"\"\"\n",
    "    return 60 * miles / estimator(feet / miles)\n",
    "\n",
    "def top(frame, field, n=20): return frame.sort_values(field, ascending=False).head(n)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
